This project involves a Monte Carlo simulation study using data generated in R to investigate the properties of estimators and confidence intervals.
The primary goal is to compare different confidence interval procedures and how successful they are in capturing p, the probability of success from a Binomial random sample.
To conduct inference on p, this report has focused on six different methods for generating confidence intervals, including:
Which intervals should we use in practice? We really need to understand what good properties of a confidence interval are, such as:
By focusing on these two properties we can compare the performance of these intervals for making inference from various combinations of p and n using a simulation with rbinom() to generate data for our comparison.
rbinom(1500, size = 15, prob = 0.01) does the trick for one combination of n = 15 and p = 0.01!The entire simulation procedure and resulting calculations for all combinations of n and p, where n = 1 to 200 and p varies from 0.01 to 0.99 in increments of 0.01, takes around 10 hours (19,800 possible combinations).
A GIF (Graphical Interchange Format) is a small collection of images that can serve as a powerful tool for bringing data to life, especially when animating a time series or other iterative type of data set.
When paired together, the ggplot2 and gganimate packages offer a simple method for transforming static plots into animated GIFs!
library(ggplot2)
library(gganimate)
The simulation results from Project 1 containing all 19,800 possible combinations of n and p have been saved in a .RData file. Begin the GIF creation process by saving this file locally, and then loading it into your R session using code similar to the below.
# Set working directory and load stored data generated from the simulation
setwd("~/ST502/Projects/Project1/gif")
load("binom95CI.RData")
The binom95CI.RData file contains three data sets:
dataSet - table of results for all 19,800 combinations of n and pplotContain - table of results in plot-friendly format for proportion of intervals that capture the true value of pplotLength - table of results in plot-friendly format for the average length of intervalsFor context, a typical static plot can be created with the ggplot2 package. An example has been provided below using subsets of plotContain and plotLength where n = 15.
# Create plot for Proportion Contained
## Optional: Use data=subset(plotContain,Method=="") to plot specific methods
lineContain <- ggplot(data=subset(plotContain,n==15),aes(x=p,y=value,color=Method))
lineContain <- lineContain +
geom_line(size=1) +
scale_y_continuous(breaks = c(0,.2,.4,.6,.8,.95), limits = c(0,1)) +
facet_wrap(vars(Method),labeller = label_both) +
labs(x="p",y="Proportion Contained") +
theme_bw() +
ggtitle("Proportion Contained @ n: 15")
lineContain
# Create plot for Average Interval Length
## Optional: Use data=subset(plotLength,Method=="") to plot specific methods
lineLength <- ggplot(data=subset(plotLength,n==15),aes(x=p,y=value,color=Method))
lineLength <- lineLength +
geom_line(size=1) +
scale_y_continuous(breaks = c(0,.2,.4,.6,.8,1), limits = c(0,1)) +
labs(x="p",y="Average Interval Length") +
theme_bw() +
ggtitle("Average Interval Length @ n: 15")
lineLength
What if we wanted to study the results of the other possible values of n without having to manually create multiple charts? Essentially, how can we illustrate the impact from an increase in sample size, n?
By generating a GIF with the help of gganimate!
By introducing a few additional lines of code when defining the attributes of our ggplot object, we can utilize the power of the gganimate package to render images into a GIF animation for all desired values of n.
The animate() function is used for customizing various properties of the generated GIF.
Additionally, any created GIF can be saved locally for future sharing over email, discussion boards, etc with the anim_save() function.
Rendering a GIF can sometimes take several minutes depending on the number of needed frames and individual plots that have to be created. For an example of a shorter GIF that can still convey key trends as n increases, try a subset of the plot data where n goes from 1 to 50.
# Create GIF for Proportion Contained
lineContain50 <- ggplot(data=subset(plotContain,n<=50),aes(x=p,y=value,color=Method))
lineContain50 <- lineContain50 +
geom_line(size=1) +
scale_y_continuous(breaks = c(0,.2,.4,.6,.8,.95), limits = c(0,1)) +
facet_wrap(vars(Method),labeller = label_both) +
labs(x="p",y="Proportion Contained") +
theme_bw() +
# gganimate code below
ggtitle("Proportion Contained @ n: {frame_time}") +
transition_time(n) + # Generates frame_time var used in ggtitle()
ease_aes("linear") +
enter_fade() + # Optional
exit_fade() # Optional
# Animate the ggplot object
animate(lineContain50,width=960,height=540)
# Set filename and save most recent animation to current working directory
anim_save("Contain_nTo50.gif")
# Create GIF for Average Interval Length
lineLength50 <- ggplot(data=subset(plotLength,n<=50),aes(x=p,y=value,color=Method))
lineLength50 <- lineLength50 +
geom_line(size=1) +
scale_y_continuous(breaks = c(0,.2,.4,.6,.8,1), limits = c(0,1)) +
labs(x="p",y="Average Interval Length") +
theme_bw() +
# gganimate code below
ggtitle("Average Interval Length @ n: {frame_time}") +
transition_time(n) + # Generates frame_time var used in ggtitle()
ease_aes("linear") +
enter_fade() + # Optional
exit_fade() # Optional
# Animate the ggplot object
animate(lineLength50,width=960,height=540)
# Set filename and save most recent animation to current working directory
anim_save("Length_nTo50.gif")
For further analysis, the entire simulation can be illustrated as n goes from 1 to 200.
Important Notes:
subset() function has been removed from the ggplot(data=) declaration, and all values of n are rendered.nframes= has been declared in the animate() function to force the creation of 200 frames within the animation, one for each value of n. Without this option, gganimate could create an animation that skips every other value of n, resulting in a less-desirable, choppy animation.# Create GIF for Proportion Contained
lineContain200 <- ggplot(data=plotContain,aes(x=p,y=value,color=Method))
lineContain200 <- lineContain200 +
geom_line(size=1) +
scale_y_continuous(breaks = c(0,.2,.4,.6,.8,.95), limits = c(0,1)) +
facet_wrap(vars(Method),labeller = label_both) +
labs(x="p",y="Proportion Contained") +
theme_bw() +
# gganimate code below
ggtitle("Proportion Contained @ n: {frame_time}") +
transition_time(n) + # Generates frame_time var used in ggtitle()
ease_aes("linear") +
enter_fade() + # Optional
exit_fade() # Optional
# Animate the ggplot object
animate(lineContain200,width=960,height=540,nframes=200)
# Set filename and save most recent animation to current working directory
anim_save("Contain_nTo200.gif")
# Create GIF for Average Interval Length
lineLength200 <- ggplot(data=plotLength,aes(x=p,y=value,color=Method))
lineLength200 <- lineLength200 +
geom_line(size=1) +
scale_y_continuous(breaks = c(0,.2,.4,.6,.8,1), limits = c(0,1)) +
labs(x="p",y="Average Interval Length") +
theme_bw() +
# gganimate code below
ggtitle("Average Interval Length @ n: {frame_time}") +
transition_time(n) + # Generates frame_time var used in ggtitle()
ease_aes("linear") +
enter_fade() + # Optional
exit_fade() # Optional
# Animate the ggplot object
animate(lineLength200,width=960,height=540,nframes=200)
# Set filename and save most recent animation to current working directory
anim_save("Length_nTo200.gif")
What trends and observations are more apparent from the animations compared to the static plot images? Enjoy!